Efficient multivariate entropy estimation via k-nearest neighbour distances

Authors

  • Thomas B. Berrett
  • Richard J. Samworth
  • Ming Yuan
Abstract

Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient in the sense of achieving the local asymptotic minimax lower bound. To this end, we initially study a generalisation of the estimator originally proposed by Kozachenko and Leonenko (1987), based on the k-nearest neighbour distances of a sample of n independent and identically distributed random vectors in R^d. When d ≤ 3 and provided k/log n → ∞ (as well as other regularity conditions), we show that the estimator is efficient; on the other hand, when d ≥ 4, a non-trivial bias precludes its efficiency regardless of the choice of k. This motivates us to consider a new entropy estimator, formed as a weighted average of Kozachenko–Leonenko estimators for different values of k. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness. In addition to the new estimator proposed and the theoretical understanding provided, our results also have other methodological implications; in particular, they motivate the prewhitening of the data before applying the estimator and facilitate the construction of asymptotically valid confidence intervals of asymptotically minimal width.

∗Research supported by a Ph.D. scholarship from the SIMS fund. †Research supported by an EPSRC Early Career Fellowship and a Philip Leverhulme prize. ‡Research supported by NSF FRG Grant DMS-1265202 and NIH Grant 1-U54AI117924-01.

arXiv:1606.00304v2 [math.ST], 14 Jul 2016.
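The abstract centres on the Kozachenko–Leonenko estimator, which turns k-nearest neighbour distances into an entropy estimate. As a minimal illustration (a sketch, not code from the paper), the classical estimator can be written as below; the function name `kl_entropy` and the naive O(n²) distance computation are choices made here for clarity:

```python
import numpy as np
from scipy.special import digamma, gammaln

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko entropy estimate (in nats) from an (n, d) sample.

    H_hat = (d/n) * sum_i log rho_{k,i} + log V_d + log(n - 1) - digamma(k),
    where rho_{k,i} is the distance from x_i to its k-th nearest neighbour
    and V_d is the volume of the unit ball in R^d.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # Pairwise Euclidean distances; O(n^2) memory, fine for small samples.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # exclude each point itself
    rho_k = np.sort(dist, axis=1)[:, k - 1]   # k-th nearest neighbour distance
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return d * np.mean(np.log(rho_k)) + log_vd + np.log(n - 1) - digamma(k)
```

For a standard normal sample in d = 1 the estimate should approach 0.5·log(2πe) ≈ 1.419 nats as n grows. The weighted estimator proposed in the paper averages such estimates over several values of k, with weights chosen to cancel the dominant bias terms in higher dimensions.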


Similar articles

Analysis of k-Nearest Neighbor Distances with Application to Entropy Estimation

Estimating entropy and mutual information consistently is important for many machine learning applications. The Kozachenko-Leonenko (KL) estimator (Kozachenko & Leonenko, 1987) is a widely used nonparametric estimator for the entropy of multivariate continuous random variables, as well as the basis of the mutual information estimator of Kraskov et al. (2004), perhaps the most widely used estima...


Nonparametric independence testing via mutual information

We propose a test of independence of two multivariate random vectors, given a sample from the underlying population. Our approach, which we call MINT, is based on the estimation of mutual information, whose decomposition into joint and marginal entropies facilitates the use of recently-developed efficient entropy estimators derived from nearest neighbour distances. The proposed critical values,...
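The decomposition MINT relies on is the identity I(X; Y) = H(X) + H(Y) − H(X, Y). A plug-in sketch of that decomposition (not MINT's actual implementation, and omitting its critical-value machinery) might look as follows; the Kozachenko–Leonenko entropy estimate is restated so the snippet is self-contained:

```python
import numpy as np
from scipy.special import digamma, gammaln

def kl_entropy(x, k=3):
    # Classical Kozachenko-Leonenko k-NN entropy estimate in nats
    # (naive O(n^2) pairwise distances, for illustration only).
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    rho = np.sort(dist, axis=1)[:, k - 1]
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return d * np.mean(np.log(rho)) + log_vd + np.log(n - 1) - digamma(k)

def mutual_information(x, y, k=3):
    # Plug-in estimate via I(X; Y) = H(X) + H(Y) - H(X, Y).
    xy = np.hstack([x, y])
    return kl_entropy(x, k) + kl_entropy(y, k) - kl_entropy(xy, k)
```

Independent samples should give an estimate near zero, while dependent samples give a strictly positive value, which is what an independence test built on this quantity exploits.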


An efficient approximation-elimination algorithm for fast nearest-neighbour search based on a spherical distance coordinate formulation

Ramasubramanian, V. and K.K. Paliwal, An efficient approximation-elimination algorithm for fast nearest-neighbour search based on a spherical distance coordinate formulation, Pattern Recognition Letters 13 (1992) 471-480. An efficient approximation-elimination search algorithm for fast nearest-neighbour search is proposed based on a spherical distance coordinate formulation, where a vector in K...


Nearest Neighbor Estimates of Entropy for Multivariate Circular Distributions

In molecular sciences, the estimation of entropies of molecules is important for the understanding of many chemical and biological processes. Motivated by these applications, we consider the problem of estimating the entropies of circular random vectors and introduce non-parametric estimators based on circular distances between n sample points and their k-th nearest neighbors (NN), where k (≤ n...


High-Dimensional Entropy Estimation for Finite Accuracy Data: R-NN Entropy Estimator

We address the problem of entropy estimation for high-dimensional finite-accuracy data. Our main application is evaluating high-order mutual information image similarity criteria for multimodal image registration. The basis of our method is an estimator based on k-th nearest neighbor (NN) distances, modified so that only distances greater than some constant R are evaluated. This modification re...




Publication date: 2016